Abstract. Due to email’s ubiquitous nature, millions of users are intimate with the
technology.  However,  most  users  are  only  familiar  with  managing  their  own  email,  which 
is  an  inherently  different  task  than  exploring  an  email  archive.  Historians  and  social 
scientists  believe  that  email  archives  are  important  artifacts  for  understanding  the 
individuals  and  communities  they  represent.  In  order  to  understand  the  conversations 
evidenced  in  an  archive,  context  is  needed.  In  this  paper,  we  present  a  new  way  to  gain 
this  necessary  context:  analyzing  the  temporal  rhythms  of  social  relationships.  We 
provide  methods  for  constructing  meaningful  rhythms  from  the  email  headers  by 
identifying  relationships  and  interpreting  their  attributes.  With  these  visualization 
techniques,  email  archive  explorers  can  uncover  insights  that  may  have  been  otherwise 
hidden  in  the  archive.  We  apply  our  methods  to  an  individual’s  fifteen-year  email  archive, 
which  consists  of  about  45,000  messages  and  over  4,000  relationships. 


Introduction 

Since  1971,  email  has  grown  rapidly  in  popularity  and  has  become  a  central  part 
of  many  users’  personal  and  professional  lives.  Despite  its  impressive  role  in 
society,  there  are  still  few  tools  available  to  explore  archives  of  email.  The  need 
for  such  tools  will  grow  as  valuable  email  archives  increase  in  availability.  The 
U.S.  National  Archives  preserves  emails  as  government  records  (Baron,  1999),  a 
recently  released  collection  of  Enron  emails  has  attracted  significant  public 
attention (Grieve, 2003), and some individuals have now accumulated email
collections  that  span  decades.  Historians  and  social  scientists  will  undoubtedly 
find  these  archives  to  be  a  valuable  basis  for  understanding  the  individuals  and 
organizations  that  created  them.  However,  it  is  currently  far  from  clear  how  these 
explorers  will  gain  the  context  they  need  to  understand  the  archive’s  numerous 
conversations. 

Figure  1  illustrates  one  way  in  which  the  universe  of  tools  for  interacting  with 
online  conversations  can  be  subdivided.  Email  is  created  by  individuals,  and 
often  in  some  organizational  or  social  context.  There  has  been  a  great  deal  of 
work  on  individual  and  organizational  email  productivity  tools  (regions  A  and  B), 
and  on  the  management  and  analysis  of  conversations  in  public  email  venues  such 
as  mailing  lists  and  Usenet  News  (regions  C  and  F).  Our  work  in  this  paper 
focuses  on  region  D,  as  we  present  new  techniques  for  exploring  the  archived 
email  of  an  individual. 

Figure 1. Types of interactions with email collections.


Although  the  principal  content  of  email  is  free  text,  when  attempting  to  browse 
archives,  the  shortcomings  of  a  text-only  display  become  clear.  Email  archive 
explorers  have  previously  tackled  the  archives  by  keyword  searching,  but  this 
approach  will  often  result  in  losing  a  conversation’s  context  (Donath,  2004). 
Visualizations  are  one  way  to  provide  this  missing  context.  In  this  paper,  we 
show  that  valuable  information  can  be  uncovered  by  visualizing  the  temporal 
rhythms  of  social  relationships  that  are  evidenced  in  email  archives.  Each 
relationship  that  is  evidenced  in  an  email  archive  has  a  rhythm  that  can  be 
characterized  by  the  intensity  of  the  correspondence  over  time.  Relationships  that 
are  brief  but  intense  have  rhythms  with  sharp  growth  and  steep  decline. 
Relationships  that  are  durable  and  strong  have  consistent  and  continuing  rhythms. 
This  paper  presents  insights  achieved  by  analyzing  the  rhythms,  which  help 
archive  explorers  question  why  certain  relationships  start  and  stop,  why  certain 
relationships  share  similar  activity  patterns,  and  the  nature  of  the  relationships 
that  yield  different  interaction  patterns. 

Detecting  long-term  rhythms,  our  focus  in  this  paper,  requires  a  collection 
spanning many years. Ben Shneiderman, a co-author of this paper and a pioneer
in the fields of Human-Computer Interaction (HCI) and Information
Visualization, has archived the emails he produced and received since 1984. The
archive portrays over 4,000 of Shneiderman’s relationships, totaling around
45,000  messages.  That  archive  spans  a  longer  period  than  any  other  collection 
that  was  available  to  us  when  we  started  this  work,  offering  us  a  unique 
opportunity  to  study  the  long-term  rhythms  of  relationships  present  in  a  real  email 
collection.  In  the  next  section,  we  review  related  work  on  interacting  with  online 
conversations.  Next,  we  define  what  we  mean  by  “relationships”  and  the 
“rhythms”  that  they  produce.  We  then  present  our  analysis  methods  and  illustrate 
the use of those methods on the Shneiderman archive. Finally, we conclude with
some  suggestions  for  future  work. 


Previous  Work 

In  this  section  we  briefly  review  prior  work  on  email  management,  organizing  the 
discussion using the task decomposition shown in Figure 1. Interaction with the
user’s own current email (Region A) is by far the most actively studied email
management task in the research literature. An early ethnographic study by
Mackay provided compelling evidence that different people deal with
large quantities of their personal email in many different ways (Mackay, 1988).
Whittaker  and  Sidner’s  later  study  resulted  in  the  same  conclusion,  while  also 
describing  tasks  that  individuals  use  email  for  beyond  the  asynchronous 
communication  for  which  it  was  designed  (Whittaker  and  Sidner,  1996).  Recent 
attempts  to  integrate  visualizations  into  email  clients  seek  to  help  users  better 
manage  their  email.  For  example,  enabling  users  to  see  the  thread  structure 
provides them with a better understanding of how conversations evolve over
time (Kerr, 2003; Venolia and Neustaedter, 2003). Another example is the
Remail project, which provides a “correspondents’ map” that allows users to
quickly see who they haven’t replied to recently, as well as a “message map” to
see messages with similar attributes (Rohall et al., 2003).

Some  recent  projects  have  investigated  exploration  of  personal  email  archives 
to  uncover  trends  and  patterns  (Region  D).  PostHistory  explored  email  archives 
that  extend  as  far  back  as  five  years,  seeking  to  support  the  development  of 
insights that would be socially relevant to the owner of the email (Viegas et al.,
2004).  PostHistory  featured  an  interface  that  animates  over  time  to  allow  users  to 
get  a  sense  for  their  steady  and  intense  relationships,  and  to  illustrate  fast-paced 
rhythms  (e.g.,  resulting  from  project  deadlines)  and  slower-paced  rhythms  (e.g., 
during  vacations).  Social  Network  Fragments,  by  contrast,  focused  on  revealing 
groups of correspondents that emerge through email exchanges (Viegas et al.,
2004). This interface also used time as a dimension to see how connections
among correspondents appear and dissolve, thereby providing a way for the user
to visualize the evolution of their own social network. In small studies, users
were able to see meaningful patterns with both PostHistory and Social Network
Fragments, sometimes using the visualization as a prompt for telling stories.

The  ubiquity  and  persistence  of  email  has  important  consequences  for  the 
management  of  information  within  organizations  (Region  B).  Ducheneaut  and 
Bellotti  studied  the  use  of  email  in  three  organizations,  and  discovered  that 
patterns  of  email  use  vary  with  individual  roles  within  those  organizations 
(Ducheneaut  and  Bellotti,  2001).  They  also  noted  that  characteristics  of  each 
organization  influenced  the  ways  in  which  people  used  and  organized  their  email 
collections.  Tyler  and  Tang  added  to  the  understanding  of  email  use  within 
organizations,  observing  that  responsiveness  patterns  vary  in  ways  that  reflect  the 
dynamics  of  interpersonal  relationships  within  an  organization  (Tyler  and  Tang, 
2003).  That  observation  led  them  to  suggest  that  tools  for  estimating  expected 
response  latency  could  help  users  detect  communication  breakdowns.  Another 
example  of  an  organization  tool  is  the  “Email  Mining  Toolkit,”  developed  by  Li 
et  al.  to  support  anomaly  detection  by  creating  behavior  models.  They  then  used 
these  models  to  detect  aberrant  behavior  of  individuals  or  groups  that  may 
indicate  abuse  or  policy  violations  (Li  et  al.,  2004). 

Exploration  of  archived  collections  of  organizational  email  has  also  been 
studied  (Region  E).  Tyler  et  al.  used  the  social  network  analysis  concept  of 
“betweenness  centrality”  to  identify  communities  in  a  large  collection  of  email 
from  a  single  organization,  discovering  that  evidence  of  the  management 
hierarchy  for  that  organization  could  be  found  in  the  structure  of  the  resulting 
graph  (Tyler  et  al.,  2003).  Leuski’s  “eArchivarius”  system  combined  clustering 
based  on  content  or  co-addressing  with  activity  timelines  and  biographies  to 
explore  activities  in  the  U.S.  National  Security  Council  during  the  Reagan  era 
using  a  small  collection  of  declassified  email  messages  (Leuski,  2003). 

Usenet  News,  a  distributed  management  system  for  a  large  collection  of  public 
mailing  lists,  has  been  archived  since  1981.  Mailing  list  usage  differs  somewhat 
from  the  use  of  personal  email,  both  because  privacy  expectations  are  reduced 
and  because  the  group-oriented  communication  structure  alters  interaction 
patterns.  Smith  used  the  “NetScan”  system  to  study  social  accounting  metrics  for 
Usenet  participation  (Region  F)  and  reported  statistics  on  authorship  and  on 
activity  over  time  (Smith,  1999;  Smith,  2002).  Usenet  News  is  immediately 
available  to  both  participants  and  nonparticipants  (“lurkers”),  which  makes  the 
distinction  between  management  and  exploration  somewhat  less  defined  than  it  is 
in  the  case  of  individual  and  organizational  email.  Users  of  the  NetScan  system 
can,  for  example,  use  it  to  find  intense  discussions  and  related  “newsgroups” 
(Region  C).  Sack’s  “Conversation  Map”  also  explored  Region  C,  focusing  on  the 
structure  of  long-term  conversations  by  using  social  network  diagrams,  lists  of 
discussion  themes,  and  semantic  network  representations  to  support  visualization 
of  conversational  structure  and  content  (Sack,  2000). 


The  work  described  in  this  section  is,  of  course,  only  a  small  sample  of  the 
extensive  research  on  email  utilization  that  has  been  reported  since  the  first  email 
was  sent  over  the  ARPANET  in  1971.  Looking  broadly  at  that  body  of  work, 
however,  two  trends  emerge.  First,  the  vast  majority  of  the  reported  research  has 
focused  on  managing  current  activities  rather  than  on  understanding  what 
happened  in  the  past.  There  has  been  much  less  work  done  in  Regions  D  and  E. 
That makes sense, since email’s ubiquity has only recently become clear and
archives of email are only now accruing. Second, the retrospective analyses on individual
email  (as  opposed  to  mailing  lists  or  Usenet  News)  that  have  been  done  have  had 
limited  scope;  we  are  aware  of  only  one  study  that  has  looked  at  even  five  years 
of  email.  In  this  paper,  we  take  a  longer  view,  looking  back  at  a  fifteen-year 
period  that  spans  1984-1998,  as  Internet  email  moved  from  adolescence  to 
adulthood. 


Relationships  in  Email  Archives 

In  this  section,  we  describe  the  email  collection  that  we  worked  with  and  the 
analytical  framework  that  we  applied  to  explore  the  long-term  rhythm  of 
relationships  in  that  collection. 

The  Shneiderman  Archive 

This archive begins in 1984, one year after Ben Shneiderman received tenure as
an Associate Professor and founded the Human-Computer Interaction Lab at the
University of Maryland. We chose to limit our study to the first fifteen years,
culminating in 1998, because Shneiderman changed his email file structure
significantly  in  1999.  The  resulting  set  includes  44,971  messages.  That  is 
certainly  not  every  email  received  or  sent  by  Shneiderman  during  that  period. 
Rather,  it  includes  those  that  Shneiderman  purposefully  stored.  Although 
analysis  of  the  results  of  intentional  retention  will  not  provide  a  complete  picture 
of  an  individual’s  email  traffic,  it  does  serve  to  filter  out  spam  and  other  less 
significant  messages.  The  saved  email  gives  historians  a  picture  of  what 
Shneiderman  felt  at  the  time  were  the  significant  conversations  in  his  professional 
life. However, our analysis will miss some subtle and friendly exchanges, which
could also serve as sources of interesting rhythms (e.g., as described by Tyler
and Tang, 2003).

Relationships 

Email  provides  a  medium  in  which  users  may  foster  relationships  with 
individuals,  organizations,  and  a  global  community.  Relationships  are 
fundamental to any form of human interaction, so we have chosen to aggregate
this collection by relationship rather than the more commonly studied
granularities  of  “threads”  (i.e.,  reply  chains)  or  individual  messages.  Aggregation 
into  relationships  facilitates  exploration  by  masking  some  sources  of  variation 
(e.g.,  multiple  email  addresses  for  a  single  individual  or  individuals  that 
participate  in  multiple  relationships)  that  might  otherwise  conceal  the  broad 
themes  that  we  wish  to  uncover.  By  “relationships”  we  mean  a  set  of 
conversations  over  time  that  reflects  a  type  of  interaction  that  was  meaningful  to 
the  person  that  created  the  email  archive.  Examples  could  include  conversations 
with  a  specific  colleague,  discussion  of  a  particular  topic  (e.g.,  academic 
governance)  involving  several  members  of  an  organization,  or  a  group  of 
messages  regarding  the  planning  of  an  event  (e.g.,  a  professional  conference). 

The  process  of  discovering  unique  identities  in  an  email  archive  is  not  trivial, 
especially when dealing with an archive that spans fifteen years. People move to
various  organizations  and  universities,  obtain  new  email  addresses,  change  their 
surnames,  and  evolve  their  academic  interests.  For  this  reason,  individuals  are 
not  classified  simply  based  on  their  email  header  information.  Instead,  each 
relationship  is  identified  with  help  from  Shneiderman’s  filing  metadata,  as  he 
typically  stored  relationships  in  separate  folders.  Conversations  with  individuals 
are  usually  stored  in  a  folder  labeled  with  their  name.  If  conversations  occurred 
with  many  participants  on  a  particular  topic,  such  as  organizing  a  conference, 
these  are  usually  stored  in  a  folder  labeled  with  a  description  of  the  topic. 

We  were  interested  in  applying  our  techniques  to  learn  about  Shneiderman’s 
professional  life,  and  not  his  personal  life.  In  the  archive,  there  were  several 
relationships  present  that  did  not  include  any  content  related  to  his  professional 
career.  These  relationships  include  his  family,  and  friends  from  outside  his 
professional  circle.  Only  about  20  of  the  4,051  relationships  in  his  archive  fell 
under  this  category,  resulting  in  a  small  number  of  deletions.  Those  relationships 
were  manually  tagged  and  deleted  before  any  analysis  was  performed. 

In  order  to  take  advantage  of  the  manually  tagged  relationships,  there  was  a 
significant  amount  of  work  necessary  to  ensure  the  data’s  representation  was 
valid. Occasional misspellings were present, surname ambiguities arose over
time (e.g., folders named ‘norman’ in early years versus folders named
‘normandon’ and ‘normankent’ in later years), and there were occasional
departures from naming conventions (e.g., storing a message from Catherine
Plaisant in a folder named ‘Catherine’ instead of ‘plaisant’). These findings
are consistent with Ducheneaut and Bellotti, who remark on users’ confusion as
to whether to store a message from a corporate colleague in a folder named
after the company or the person (Ducheneaut and Bellotti, 2001). These
inconsistencies were corrected by fixing typographical errors and standardizing
the naming convention for relationships that contained conversations with
similar email addresses.

Before  our  analysis,  Shneiderman  categorized  each  relationship  into  one  of 
three distinct groups. A relationship could be tagged as a person, which meant
the messages in that folder all revolved around the relationship with a single person.
A  relationship  could  also  be  tagged  as  an  organization,  which  meant  the  messages 
contained  within  that  folder  revolved  around  a  variety  of  individuals  all 
communicating  about  or  within  the  same  organization.  Finally,  the  relationship 
could  be  tagged  as  a  topic,  which  meant  a  variety  of  people  from  one  or  more 
organizations  all  communicating  about  a  similar  topic.  Of  the  4,051 
relationships,  almost  95%  were  tagged  as  people  (3,836),  compared  to  only  197 
organization  relationships  and  18  topics. 

We  should  note  that  our  human-assisted  categorization  methods  are  not  a  strict 
requirement  for  exploring  archives.  For  example,  relationships  could  be 
postulated  automatically  based  on  email  addresses  and/or  message  content. 
However,  the  availability  of  Shneiderman’s  personal  categorization  scheme  gave 
us  comfort  that  we  would  be  analyzing  an  accurate  representation  of  the  corpus, 
reducing  the  noise  present  in  our  rhythms. 

Rhythms  of  Relationships 

By  the  “rhythm  of  a  relationship”  we  mean  the  pattern  of  activity  for  a 
relationship  over  the  duration  of  an  email  archive.  For  example,  in  Figure  2,  two 
relationship  rhythms  are  shown.  The  left  rhythm  depicts  a  relationship  that  was 
inactive during the early years, became active in the middle, and then grew to be
an  intense  relationship  in  the  later  years.  Conversely,  the  rhythm  on  the  right 
shows  a  relationship  that  starts  out  intensely  and  then  eventually  dies  down  into 
sporadic  contact.  These  types  of  rhythms  can  be  extracted  from  information  that 
is  present  in  email  headers  alone,  thereby  minimizing  the  need  for  access  to  text 
in  the  bodies  of  the  email  that  would  naturally  be  more  problematic  from  a 
privacy  perspective.  Due  to  our  interest  in  understanding  long-term  patterns,  we 
construct  rhythms  that  have  a  granularity  of  a  year. 
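
As an illustration only, the construction of yearly rhythms can be sketched in a few lines of Python. This is not the tooling used in this study. The function name `build_rhythms` and the sample records are hypothetical, and we assume each message has already been reduced to a pair of relationship label and year taken from its headers and folder.

```python
from collections import defaultdict

FIRST_YEAR, LAST_YEAR = 1984, 1998  # span of the archive studied here


def build_rhythms(messages, first_year=FIRST_YEAR, last_year=LAST_YEAR):
    """Return {relationship: [message count per year]} from (relationship, year) pairs."""
    n_years = last_year - first_year + 1
    rhythms = defaultdict(lambda: [0] * n_years)
    for relationship, year in messages:
        if first_year <= year <= last_year:
            rhythms[relationship][year - first_year] += 1
    return dict(rhythms)


# Hypothetical records: (folder label, year of the Date header).
sample = [("plaisant", 1990), ("plaisant", 1990), ("plaisant", 1991),
          ("norman", 1984), ("norman", 1998)]
rhythms = build_rhythms(sample)
```

Each rhythm is then a fifteen-element vector, one count per year, which is the representation assumed wherever rhythms are compared or clustered.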


Figure  2.  Examples  of  rhythms  of  relationships. 


Profiles  of  Shneiderman’s  Most  Active  Relationships 

Clearly not all relationships are created equal; certain relationships are very intense
whereas others are quiet and infrequent. In fact, about a third (31%) of
relationships in the Shneiderman archive have fewer than two messages and 55%
have fewer than four messages. Only 11% of the relationships present in the email
archive ever reach 20 or more messages.

Examining  the  key  relationships  in  an  email  archive  provides  an  understanding 
of  the  nature  of  the  owner’s  work.  Since  the  Shneiderman  archive  consists  of  only 
3,836  individual  relationships,  it  is  likely  that  the  contents  are  tied  to  only  the 
most  valued  relationships.  To  gain  an  understanding  of  the  most  frequent 
correspondents,  we  extracted  the  relationships  with  100  or  more  saved  messages, 
leaving  only  76  professional  relationships. 

These  76  professional  relationships  were  only  2%  of  the  3,836  professional 
relationships,  but  they  produced  12,771  saved  messages  (31%)  out  of  the  41,420 
saved  messages.  The  power  distribution  of  relationships  is  seen  in  Figure  3.  We 
expect  this  distribution  to  be  common  in  email  archives  of  individuals,  with  a 
bulk  of  the  messages  tied  to  a  small  number  of  key  relationships. 
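
The concentration figures above are simple ratios (76 / 3,836 is about 2% of relationships, and 12,771 / 41,420 is about 31% of messages). A minimal sketch of the same calculation for an arbitrary archive follows. The function name `concentration` and the sample totals are hypothetical and are not the archive’s data.

```python
def concentration(totals, threshold=100):
    """Share of relationships, and of messages, at or above `threshold` messages."""
    active = [t for t in totals if t >= threshold]
    return len(active) / len(totals), sum(active) / sum(totals)


# Hypothetical per-relationship message totals, not the archive's real data.
totals = [300, 150, 40, 5, 3, 1, 1]
rel_share, msg_share = concentration(totals)
```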


Figure  3.  Power  distribution  of  relationships. 

Having  contact  with  the  archive’s  owner  is  not  a  luxury  we  expect  most 
historians  and  social  scientists  to  have.  However,  we  exploit  our  contact  with 
Shneiderman  to  attain  accounts  of  who  these  76  most  active  relationships  were. 
This  knowledge  is  useful,  as  we  can  judge  our  techniques  against  these  verifiable 
truths.  The  information  provided  by  Shneiderman  is  described  below,  as  it 
provides insight into the types of intense relationships that emerge in a
fifteen-year email archive.

The  top  ten  most  active  professional  relationships  had  between  240  and  634 
total  messages.  These  relationships  included  four  key  colleagues  at  the 
University of Maryland (Plaisant, Marchionini, Norman, Chimera), conference
organizing partners (Light, Soloway, Rotenberg), and collaborators on other
projects (Simons, Ahlberg, Grudin). These reflect Shneiderman’s major projects:
some show a small number of intense years of activity with over 140 saved
messages (Ahlberg, Simons, Light, Rotenberg), while the rest show a steadier
pace of exchanges.

These 76 most active relationships were relatively easy for Shneiderman to
assign to categories. On a large table, he created a small card for each
relationship and sorted them into clusters. About a dozen of the names had more
than one role, such as when a University of Maryland colleague moved to another
university, a former student became a corporate partner, or a book editorial
worker was also a colleague at another university. Assignment was by major
role, as determined by the majority of saved messages rather than duration.

As  expected,  many  of  the  most  active  professional  relationships  are  from  the 
University of Maryland, with 11 being close colleagues, 9 being students, and 11
others  being  superiors  (chairs,  deans)  and  staff  (secretaries,  administrators). 
Colleagues  at  other  universities  accounted  for  17  of  the  most  active  professional 
relationships,  while  conference  organizing  partners  and  related  efforts  covered  10 
relationships.  Corporate  partners  including  financial  supporters,  consultancies, 
and  book  or  lecture  collaborators  covered  9  relationships. 

Other  important  relationships  included  4  colleagues  tied  to  the  US  ACM  Public 
Policy group, in which Shneiderman was a member of the Executive Committee.
Development of Shneiderman’s book, Designing the User Interface: Strategies
for  Effective  Human-Computer  Interaction  (Addison-Wesley  Publishers),  showed 
strong  activity  for  3  people  in  the  years  when  the  first  edition  (1986),  second 
edition  (1991),  and  third  edition  (1997)  were  in  production.  Finally,  close 
collaboration  with  2  government  partners  at  the  National  Library  of  Medicine  and 
the  Library  of  Congress  generated  high  levels  of  activity  for  several  years. 


Most Active Professional Relationships
more than 100 saved messages (n = 76)

Role                               Number   Avg. Years Active   Avg. Total Messages
UMD - Close colleagues                 11                 9.2                 209.7
UMD - Superiors and staff              11                 9.6                 123.0
UMD - Students                          9                 9.0                 183.8
Colleagues at other universities       17                11.3                 152.4
Conference partners                    10                 8.3                 172.7
Corporate partners                      9                 9.1                 137.6
US ACM Public Policy                    4                 5.5                 252.3
Book editorial workers                  3                 8.7                 183.0
Government partners                     2                 9.5                 171.5

Figure  4.  Shneiderman’s  most  active  relationships,  categorized  by  role. 


Methods  for  Understanding  Email  Archives 

In  this  section,  we  identify  certain  tasks  that  lead  to  insights  by  analyzing  the 
rhythm  of  relationships  in  email  archives.  For  each  task,  we  describe  the 
visualization  methods  that  lead  to  the  insights  and  the  set  of  features  on  which 
that  visualization  is  based.  We  illustrate  the  utility  of  these  analysis  methods  with 
examples  from  the  Shneiderman  archive. 

Evolution  of  Relationships 

With  a  corpus  that  spans  15  years,  it  is  to  be  expected  that  the  nature  of  some 
relationships  will  change  over  that  period.  By  examining  relationships 
individually,  it  is  possible  to  witness  certain  relationships  blossom,  while  other 
relationships  conclude.  However,  when  looking  at  all  the  relationships  together, 
one  might  wonder  what  sorts  of  collective  patterns  emerge:  Did  the  frequency  of 
archived  emails  change  as  email  became  more  ubiquitous?  Are  there  specific 
periods  in  time  when  the  social  circle  changed  more  rapidly  than  others? 
Questions  of  this  type  can  be  answered  with  the  following  approach. 

One  of  the  simplest  analyses  that  can  be  done  is  to  count  the  number  of 
messages  over  time.  Figure  5  illustrates  the  rapid  growth  in  the  number  of 
archived  messages  over  time,  increasing  from  98  emails  in  1984  to  8,499  in  1998. 
Figure  5  also  shows  the  number  of  active  relationships,  counted  for  each  year 
over the same period. The growth in the number of active relationships is well fit
by a linear function, while the growth in the total number of messages is well
fit by a quadratic function. This archive spans a period in which the number of
ARPANET/Internet users grew exponentially, and in that context, the more
sedate linear growth in the number of relationships is interesting.

Figure 5. Growth rates for messages and relationships.

Figure 6. Over 4,000 relationship rhythms superimposed.

By  counting  the  number  of  messages  and  active  relationships  over  time, 
explorers  can  get  a  sense  of  how  an  email  archive  evolves.  Interesting 
characteristics can be determined, such as whether the individual fosters more
relationships over time and whether the growth is consistent with the growth of the
Internet. A limitation of this approach is that these aggregate counts mask
considerable individual variation, as seen in Figure 6, which provides a
superimposed  image  of  over  4,000  relationship  rhythms  from  the  archive.  Figure 
6  also  illustrates  a  somewhat  surprising  (and  presently  unexplained)  absence  of 
brief-but-very-intense  relationships  during  the  middle  years  of  the  archive. 
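
The yearly counts discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the figures. Rhythms are assumed to be equal-length lists of yearly message counts keyed by relationship, and the names `yearly_summary` and `fit_line` and the sample data are hypothetical. The line fit is ordinary least squares against the year index.

```python
def yearly_summary(rhythms):
    """Return (messages per year, active relationships per year)."""
    n_years = len(next(iter(rhythms.values())))
    messages = [sum(r[i] for r in rhythms.values()) for i in range(n_years)]
    active = [sum(1 for r in rhythms.values() if r[i] > 0) for i in range(n_years)]
    return messages, active


def fit_line(ys):
    """Ordinary least-squares slope and intercept for ys against 0, 1, 2, ..."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical three-year rhythms for three relationships.
rhythms = {"a": [1, 2, 3], "b": [0, 1, 1], "c": [0, 0, 2]}
messages, active = yearly_summary(rhythms)
slope, intercept = fit_line(active)
```

A relationship is counted as active in a year if it has at least one saved message in that year; other definitions of activity are possible.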


Relationship  Rhythm  Patterns 

Useful insights about a relationship can be discovered from the pattern of its
rhythm. For example, if a historian were looking for evidence of relationships that
were strongly related to a temporal event, a search tool that could find
relationships that peaked around the time of the event might be useful. One way
to support this is by allowing the user to sketch a graph to query the time series, a
technique introduced by Wattenberg (2001).

Figure 7 illustrates an example of this type of search on the Shneiderman
Archive using the “Hierarchical Clustering Explorer” (HCE) (Seo and
Shneiderman, 2002). Suppose the searcher postulated that Shneiderman’s
activities related to policy issues grew markedly in the mid-1990s. If they had an
interest  in  exploring  relationships  that  were  unique  to  that  period,  they  might  then 
construct  a  query  (represented  in  Figure  7  by  a  bold  line),  seeking  relationships 
that  sharply  grew  in  1994,  peaked  in  1995,  and  declined  in  1996.  Rhythms  that 
match this query are shown as thinner lines. The gray background provides a
contour based on the most active relationships in the corpus for each year. This
technique allows explorers to quickly find relationships that follow expected
patterns. Of course, there are also situations in which a searcher may not have a
specific question in mind when they begin exploring an archive. In this case,
providing the searcher with clusters of similar rhythms might offer a point of
departure for further investigation.

Figure 7. Searching an email archive with a rhythm query.
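
One way to picture a rhythm query is as a shape comparison over a window of years. The sketch below is an assumption made for illustration and does not describe how HCE itself matches queries. It scores each rhythm by its Pearson correlation with the sketched query and keeps those above a threshold. The names `pearson` and `match_query`, the threshold, and the sample rhythms are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def match_query(rhythms, query, start, threshold=0.9):
    """Names of rhythms whose shape over the query window correlates with `query`."""
    end = start + len(query)
    return sorted(name for name, r in rhythms.items()
                  if pearson(r[start:end], query) >= threshold)


# Hypothetical rhythms for 1993-1997 (index 0 = 1993).
rhythms = {
    "policy": [0, 10, 40, 12, 1],   # grows in 1994, peaks in 1995, declines in 1996
    "steady": [20, 21, 20, 22, 21],
    "late":   [0, 0, 2, 10, 30],
}
query = [0, 1, 3, 1, 0]             # sketched shape: rise, peak, fall
hits = match_query(rhythms, query, start=0)
```

Correlation compares shape only and ignores magnitude, so a quiet relationship and an intense one with the same rise and fall both match; a distance-based score would behave differently.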

K-means Clustering

Clustering  based  on  similarity  can  be  a  useful  way  of  revealing  characteristic 
rhythms.  Figure  8  shows  the  result  of  clustering  the  76  most  active  relationships 
(i.e.,  those  with  the  largest  total  number  of  messages)  in  the  Shneiderman  Archive 
into  9  clusters.  We  applied  k-means  clustering  (MacQueen,  1967)  to  the  15-year 
rhythms of these active relationships. The number of clusters, k, is a parameter of
the algorithm. The k-means algorithm assigns each of the 76 rhythms to the cluster
with the nearest centroid and recomputes the centroids, repeating until the total
distance between the rhythms and their cluster’s centroid stops decreasing.
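The procedure just described can be sketched as follows. This is a plain, standard-library version of the usual assign-and-update loop, not the implementation used for Figure 8; the random initialization, the squared Euclidean distance, and the function names are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rhythms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    """Component-wise mean of a non-empty list of rhythms."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def kmeans(rhythms, k, iterations=100, seed=0):
    """Cluster rhythms (lists of yearly counts) into k groups.

    Returns (labels, centroids), where labels[i] is the cluster of rhythms[i].
    """
    rng = random.Random(seed)
    # Assumption: start from k rhythms picked at random as initial centroids.
    centroids = [list(r) for r in rng.sample(rhythms, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each rhythm joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(r, centroids[c]))
            for r in rhythms
        ]
        if new_labels == labels:
            break  # no rhythm changed cluster, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, label in zip(rhythms, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids
```

Because the result depends on the starting centroids, running the sketch with several seeds and keeping the run with the smallest total distance is a common precaution.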

Choosing an appropriate k is difficult, especially for a searcher
unfamiliar with the overall structure of the rhythms or archive. In our initial run,
we  asked  the  archive’s  owner,  Shneiderman,  to  group  every  relationship  with 
more  than  100  messages  into  distinct  groups.  By  printing  out  the  names  on  cards, 
and  sorting  the  76  relationships  manually,  he  came  up  with  the  9  distinct  groups 
listed  earlier  in  Figure  4.  It  is  important  to  note  that  these  categories  were  not 
chosen  based  on  rhythm  patterns.  Rather,  groups  were  chosen  based  on  the  roles 
of  the  people  (e.g.  academic  colleague,  corporate  collaborator  or  graduate 
student). There was no evidence that each of these roles should constitute its
own rhythm cluster, but it provided an interesting value of k to start with.

Figure 8. Nine groups found using k-means time series clustering on the 76
most active relationships.

The k-means clustering algorithm provides meaningful results, as it
successfully  displays  similar  patterns,  such  as  those  that  accelerate  in  the  later 
years  (Cluster  2),  relationships  that  start  strong  and  then  die  down  (Cluster  3),  and 
relationships  that  peak  in  similar  years  (Cluster  4).  However,  this  algorithm 
classifies  most  of  the  relationships  into  the  first  cluster,  providing  little  useful 
information  on  that  set.  Selection  of  a  different  number  of  clusters  might  yield 
more  insight  in  those  cases,  but  in  general  users  often  find  a  priori  selection  of 
the  number  of  desired  clusters  to  be  problematic.  Also,  the  clusters  found  had  no 
noticeable  correlation  with  the  clusters  identified  by  Shneiderman  in  Figure  4. 

Hierarchical  Clustering 

Hierarchical  clustering  is  another  algorithm  that  can  group  similar  rhythms,  but 
does  not  require  a  predetermined  number  of  clusters.  Hierarchical  clustering 
works  by  finding  the  pair  of  relationships  with  the  most  similar  rhythms.  It  then 
iteratively  builds  a  hierarchy  by  pairing  these  relationships  with  each  other,  or 
with an existing cluster of similar relationships. Figure 9 shows results of
hierarchical clustering using HCE on all 4,051 relationships. The hierarchy that
HCE  builds  is  shown  using  a  dendrogram,  displayed  in  the  top  panel  of  the  figure. 
Each  subtree  of  the  dendrogram,  alternating  in  gray  and  black,  represents  the 
cluster  of  relationships  that  were  most  intense  in  each  of  the  15  years.  These 
subtrees  are  not  arranged  in  chronological  order,  but  instead  retain  their  order 
from  the  constructed  dendrogram.  These  subtrees  lead  down  to  the  leaves,  where 
each  relationship  is  represented  as  a  column  of  tiles.  Each  tile  in  the  column  is 
shaded  to  correspond  to  that  relationship’s  intensity  in  a  given  year.  In  this 
figure,  gray  shading  means  a  strong  intensity. 
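The bottom-up pairing described above can be sketched as a naive agglomerative procedure. This is not HCE's implementation: the choice of average linkage, the Euclidean distance, and the nested-tuple representation of the dendrogram are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length rhythms."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(members_a, members_b):
    """Mean pairwise distance between the rhythms of two clusters."""
    total = sum(euclidean(a, b) for a in members_a for b in members_b)
    return total / (len(members_a) * len(members_b))

def build_dendrogram(rhythms):
    """Merge the two closest clusters until a single tree remains.

    rhythms: non-empty dict mapping a relationship name to yearly counts.
    Returns a nested tuple such as ("c", ("a", "b")).
    """
    # Each cluster is (tree, member rhythms); leaves start as single names.
    clusters = [(name, [vec]) for name, vec in rhythms.items()]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merged = ((tree_i, tree_j), members_i + members_j)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

This version recomputes every pairwise linkage at each step, so it is only practical for small inputs; clustering thousands of relationships would call for cached distances or an optimized library.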


Figure  9.  Hierarchical  clustering  results  on  all  4,051  relationships. 


The  subtree  surrounded  by  a  black  box  at  the  top,  labeled  ‘1988’  and  in  the 
middle  of  the  dendrogram,  represents  those  relationships  that  were  most  intense 
in  1988.  Notice  how  the  tiles  below  this  subtree  have  an  obvious  gray  line  in  the 
fifth  row  of  the  columns  (we  annotate  this  row  with  a  white  arrow  for  clarity). 
That  row  represents  1988  and  the  shading  conveys  the  large  number  of  messages. 
The  rhythm  profiles  that  correspond  to  the  selected  subtree  are  shown  in  the 
bottom  panel,  where  the  intense  activity  in  1988  among  these  relationships  is 
confirmed. 

Hierarchical  clustering  also  detects  groups  of  relationships  that  are  similar 
beyond  one  year.  Subtrees  of  the  dendrogram  isolate  relationships  that  have 
peaks  in  multiple  years.  For  example,  the  algorithm  constructs  a  subtree  for  those 
relationships  that  have  modest  intensity  in  1996,  grow  a  great  deal  in  1997  and 
then grow a little more in 1998. Looking at this cluster’s list of relationships, the
four most intense relationships involving Shneiderman’s interest in policy are found
(Gelman, Brownstein, Ellis, and Simons). This provides evidence that clusters
can  convey  meaning,  as  the  four  relationships,  remarkably,  can  be  identified 
when  using  HCE  to  zoom  in  on  the  subtree  (as  shown  in  Figure  10,  a  view  which 
shows  only  2%  of  the  entire  tree  structure). 

However,  a  weakness  of  this  approach  is  that  not  all  of  these  clusters  have 
meaning.  For  example,  the  algorithm  finds  three  relationships  that  have  peaks  in 
the disparate years of 1988 and 1994. After exploring deeper into the email
content, it appears that this is about all these relationships have in common.


Aggregating  Related  Rhythms 


In  addition  to  looking  at  the  pattern  of  individual  relationships,  it  is  also  a  useful 
exercise  to  visualize  rhythms  of  related  aggregate  relationships  to  see  trends 
based  on  other  attributes,  such  as  organization  and  location.  For  this  corpus,  we 
generate  the  aggregates  from  information  contained  within  the  email  headers.  For 
each  relationship,  the  most  frequent  email  address  will  represent  that 
relationship’s  attributes.  Of  course,  when  dealing  with  an  individual’s  email 
archive,  all  of  the  addresses  used  by  the  owner  should  be  disregarded.  For  each 
relationship,  we  extract  organization  names  (IBM  from  user@ibm.com), 
organization  type  (educational  from  user@umd.edu  versus  commercial  from 
user@spotfire.com)  and  country  codes  if  present  (Israel  from 
user@technion.ac.il).  With  this  extracted  information,  we  illustrate  some  of  the 
types  of  analysis  that  can  be  performed. 
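The extraction step can be sketched as simple parsing of the domain part of an address. The lookup tables below are small, hypothetical samples rather than complete registries, and the rule for second-level labels such as "ac" is an assumption about common naming practice, not a description of the parser used for this corpus.

```python
from collections import Counter

# Illustrative, incomplete lookup tables (assumptions, not full registries).
GENERIC_TLDS = {"edu": "educational", "com": "commercial",
                "gov": "government", "org": "non-profit"}
COUNTRY_CODES = {"il": "Israel", "uk": "United Kingdom", "ca": "Canada",
                 "de": "Germany", "jp": "Japan"}
SECOND_LEVEL_TYPES = {"ac": "educational", "co": "commercial"}

def most_frequent_address(addresses, owner_addresses=()):
    """Pick the address that represents a relationship, ignoring the owner's."""
    owner = {a.lower() for a in owner_addresses}
    counts = Counter(a.lower() for a in addresses if a.lower() not in owner)
    return counts.most_common(1)[0][0] if counts else None

def describe(address):
    """Derive organization, organization type, and country from an address."""
    labels = address.rsplit("@", 1)[-1].lower().split(".")
    tld = labels[-1]
    country = COUNTRY_CODES.get(tld)
    if country and len(labels) >= 3 and labels[-2] in SECOND_LEVEL_TYPES:
        # e.g. technion.ac.il: the type sits in the second-level label
        org_type = SECOND_LEVEL_TYPES[labels[-2]]
        organization = labels[-3]
    else:
        org_type = GENERIC_TLDS.get(tld)
        organization = labels[-2] if len(labels) >= 2 else None
    return {"organization": organization, "type": org_type, "country": country}

print(describe("user@technion.ac.il"))
```

Summing the yearly counts of all relationships that share a value of one of these attributes then gives an aggregate rhythm for that organization, type, or country.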

Although the number of active relationships increases over time, it became
clear that many of Shneiderman’s emails were still dedicated to relationships
within his organization. Over the fifteen-year period, 24% of all of his emails
were exchanged with relationships at his own university, the University of
Maryland. This percentage is comparable to the fraction of messages in
relationships with colleagues at other academic institutions (25%) and at all
corporations (23%), and double the fraction of messages exchanged beyond U.S.
borders (12%). Figure 11 shows a plot of the number of messages with each type of
organization over the fifteen-year time period.

Figure 11 also shows how correspondence with international contacts grew over
the fifteen-year time period. As Shneiderman’s total number of messages grew,
so did his correspondence with international contacts. Segmenting the data by
country allows us to easily find the countries with the most relationships. The
top five countries are the United Kingdom (84 relationships), Canada (63),
Germany (39), Israel (35) and Japan (31).

Figure 10. A zoomed-in view of the dendrogram. The four relationships related to
Shneiderman’s interest in policy are denoted with triangles at the bottom of the
graphic. One of these relationships (Ellis) is highlighted.

Figure 11. Aggregate Rhythms generated from Domain Names.

Grouping  relationships  by  country  allows  explorers  to  notice  trends  present  in 
Shneiderman’s  international  rhythms.  Countries  such  as  Germany,  Canada, 
Japan  and  the  United  Kingdom  have  stable  rhythms  throughout  most  of  the 
archive.  However,  there  are  countries  like  Australia,  France  and  Italy  that  only 
grow  towards  the  end  of  the  archive.  Other  distinct  profiles,  like  that  of  Austria 
and  Finland,  peak  in  intensity  towards  the  middle  of  the  archive  and  then  fade  as 
time  goes  on. 

This  approach  allows  explorers  to  find  patterns  and  trends  based  on 
relationships  sharing  similar  attributes.  However,  the  email  address  might  not  be 
an  accurate  representation  of  the  relationship,  thereby  skewing  the  rhythms. 
Furthermore,  individuals  may  change  their  organization  and  location  over  time, 
but  our  method  will  only  assign  the  relationship  its  most  frequent  attributes  over 
the  duration  of  the  archive. 

Collaboration  Rhythms 

One  important  feature  of  email  is  its  ease  of  distributing  messages  to  more  than 
one  person  simultaneously.  This  is  a  typical  activity  when  collaborating  with 
colleagues  and  these  collaborations  are  evidenced  by  email  headers  addressed  to 
multiple  people.  To  gain  insights,  we  construct  collaboration  rhythms:  rhythms 
characterized  by  the  intensity  of  correspondence  between  two  individuals,  besides 
the  archive  owner,  over  time.  Collaboration  rhythms  can  be  constructed  by 
calculating  the  number  of  times  two  unique  people  are  a  part  of  the  same 
conversation over the duration of the archive. These rhythms can be generated
with an O(N²) algorithm which iterates through every email address in the corpus
that doesn’t belong to the archive owner, and counts the number of times it is a
part of an email (e.g., listed on the to/from/cc lines of the email header) with
every other email address in the corpus.
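The counting step can be sketched as below. Instead of looping over every pair of addresses in the corpus, this variation visits each message once and counts the pairs that actually co-occur on it, which yields the same counts for pairs that appear together. The input format, a list of (year, participant addresses) tuples taken from the to/from/cc lines, and the function name are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def collaboration_rhythms(messages, owner_addresses):
    """Count, per year, how often two non-owner addresses share a message.

    messages: iterable of (year, addresses on the to/from/cc lines)
    Returns {(address_a, address_b): {year: count}} with address_a < address_b.
    """
    owner = {a.lower() for a in owner_addresses}
    rhythms = defaultdict(lambda: defaultdict(int))
    for year, participants in messages:
        # Drop the archive owner and duplicates; sort so each pair has one key.
        others = sorted({p.lower() for p in participants} - owner)
        for pair in combinations(others, 2):
            rhythms[pair][year] += 1
    return {pair: dict(by_year) for pair, by_year in rhythms.items()}

# Hypothetical messages.
sample = [
    (1995, ["me@umd.edu", "a@x.com", "b@y.org"]),
    (1995, ["b@y.org", "a@x.com", "me@umd.edu", "c@z.edu"]),
    (1996, ["a@x.com", "b@y.org"]),
    (1996, ["me@umd.edu", "a@x.com"]),
]
print(collaboration_rhythms(sample, ["me@umd.edu"]))
```

Mailing-list addresses would appear here like any other participant, so a real pipeline would likely need a filter for them.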

When  plotting  the  collaboration  rhythms  of  Shneiderman’s  archive,  some 
interesting  trends  become  evident.  Most  collaborations  seemed  to  last  less  than  a 
year,  and  it  was  rare  for  a  collaboration  to  last  more  than  two  years.  The 
collaboration  rhythms  with  the  most  interesting  patterns  generally  turned  out  to 
be  mailing  lists  (e.g.  a  common  poster  to  a  particular  list),  as  mailing  lists  have 
unique  email  addresses  too.  However,  even  with  these  shortcomings,  it  was  easy 
to  discern  the  top  collaborators  by  glancing  at  the  sharp  peaks  after 
superimposing  all  collaboration  rhythms  into  one  plot.  These  collaborations 
reinforce the notion that Shneiderman’s intense email relationships focus on
coordination  of  distinct  projects  over  time.  Without  collaboration  rhythms,  it 
would  be  hard  to  get  a  sense  of  the  nature  of  collaborations  between  individuals 
in  the  archive. 

A limitation of this approach is that if users change their email addresses over
time, the rhythms will be incomplete. However, folder metadata and the
user’s full name from the email header could help reduce the noise by
creating more robust identities of users.


Future  Work 

Rhythms of relationships offer a class of information that is hard to discern from
keyword  searching  or  reading  the  body  of  the  emails.  However,  our  rhythms  will 
only  answer  a  subset  of  questions  that  searchers  may  have.  Our  research  interests 
are  to  build  on  the  knowledge  gained  in  this  paper,  and  devise  additional  ways 
that  searchers  can  learn  more  about  the  archive. 

One  weakness  of  our  use  of  the  clustering  algorithms  is  that  they  do  not  cluster 
independent  of  time.  For  instance,  if  two  relationships  have  identical  curves  over 
a  time  segment,  but  occur  in  disparate  years  (e.g.  one  rhythm  segment  centers 
around  1989  versus  a  second  rhythm’s  center  of  1996),  our  algorithms  do  not 
consider  them  similar.  Interesting  results  can  emerge  by  finding  similar  peaks 
and  growths,  such  as  determining  if  there  is  a  typical  rhythm  associated  with 
classes  of  people  over  time  (e.g.  a  typical  graduate  student  curve)  or  if  a  certain 
initial  pattern  of  activity  predicts  a  durable  or  intense  relationship. 
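One simple way to compare rhythms independent of time is to align each one on its peak year before measuring distance. The sketch below is only an illustration of that idea under stated assumptions (peak alignment, scaling by peak height, a fixed window around the peak); it is not a method evaluated in this paper, and the function names are hypothetical.

```python
def align_to_peak(rhythm, width):
    """Return 2*width+1 values centred on the rhythm's peak year.

    Values are scaled so the peak equals 1.0; positions that fall
    outside the archive are padded with zeros.
    """
    peak = max(range(len(rhythm)), key=rhythm.__getitem__)
    scale = rhythm[peak] or 1  # avoid dividing by zero for empty rhythms
    return [
        rhythm[i] / scale if 0 <= i < len(rhythm) else 0.0
        for i in range(peak - width, peak + width + 1)
    ]

def shape_distance(a, b, width=2):
    """Euclidean distance between two rhythms after peak alignment."""
    wa, wb = align_to_peak(a, width), align_to_peak(b, width)
    return sum((x - y) ** 2 for x, y in zip(wa, wb)) ** 0.5

# Hypothetical rhythms with the same shape in different years.
early = [0, 5, 20, 5, 0, 0, 0, 0, 0, 0]
late = [0, 0, 0, 0, 0, 0, 0, 10, 40, 10]
print(shape_distance(early, late))
```

Feeding such a distance to a clustering algorithm would group relationships by the shape of their growth and decline rather than by the calendar years in which they were active.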

The  rhythms  discussed  in  this  paper  use  a  granularity  of  a  year,  which  was 
motivated  by  our  interest  in  understanding  long-term  rhythms.  However,  we 
suspect  different  evidence  will  emerge  if  the  analysis  were  repeated  with  a 
granularity  of  months,  weeks  or  days.  In  the  case  of  Shneiderman,  we  predict 
distinct  trends  of  rhythms  surrounding  academic  semesters,  conferences  and 
weekends. 


Although  we  believe  our  techniques  are  universal,  so  far  they  have  only  been 
tested on the Shneiderman email archive. In the future, we plan to test these
methods  on  other  archives  to  see  if  similar  success  is  achievable  on  archives  of 
various  durations  and  sizes. 


Conclusion 

Historians  and  social  scientists  believe  that  email  archives  are  important  artifacts 
for  understanding  the  individuals  and  communities  they  represent.  However, 
there  are  currently  few  methods  or  tools  to  effectively  explore  these  archives. 
This  paper  presents  a  novel  approach  by  analyzing  the  temporal  rhythms  of 
relationships  in  an  email  archive.  By  visualizing  these  rhythms,  important 
relationships  become  evident,  searchers  can  find  patterns  of  interest,  and 
aggregate  trends  can  be  identified.  We  apply  these  techniques  to  the 
Shneiderman archive, and discover insights that may have been otherwise hidden.

Rhythms  of  relationships  are  an  innovative  way  to  understand  email  archives. 
However,  the  novel  approach  also  comes  without  rigorous  testing.  More 
evaluation is necessary, but the insights observed from the Shneiderman archive
offer  promising  expectations.  We  feel  the  techniques  we  introduce  help  provide 
context  that  is  necessary  for  historians  and  social  scientists  to  make  effective  use 
of  the  archives.  The  number  and  size  of  email  archives  will  undoubtedly  grow  in 
future  years  and  searching  them  will  become  a  more  customary  task.  By 
presenting new ways to approach the exploration of email archives, we not only
provide a new step for effective exploration, but also raise awareness of the
difficult task of understanding email archives.


Acknowledgments 

We  would  like  to  thank  Susan  Davis,  Danyel  Fisher,  Mara  Hemminger,  Dave  Levin  and  Anthony 
Ramirez  for  their  thoughtful  comments  on  prior  versions  of  the  paper.  We  would  also  like  to 
thank  Jinwook  Seo  for  providing  assistance  with  Hierarchical  Clustering  Explorer.  This  work  has 
been  supported  by  DARPA  cooperative  agreements  N66001002810. 


References 


Baron, J. R. (1999): ‘Email Metadata in a Post-Armstrong World’, 3rd IEEE Metadata
Conference, http://www.computer.org/proceedings/meta/1999/papers/83/jbaron.html

Donath, J. (2004): ‘Visualizing Email Archives (Draft)’,
http://smg.media.mit.edu/papers/Donath/EmailArchives.draft.pdf


Ducheneaut,  N.  and  Bellotti,  V.  (2001):  ‘Email  as  habitat:  an  exploration  of  embedded  personal 
information  management’.  Interactions,  vol.  8,  no.  5,  Sept/Oct  2001,  pp.  30-38. 

Grieve,  Tim  (2003):  ‘The  decline  and  fall  of  the  Enron  empire’,  Salon.com,  October  14,  2003. 

Kerr,  Bernard  (2003):  ‘Thread  Arcs:  An  Email  Thread  Visualization’,  2003  IEEE  Symposium  on 
Information  Visualization,  pp.  27. 

Leuski,  A.,  Oard,  D.  W.,  and  Bhagat,  R.  (2003):  ‘eArchivarius:  Accessing  Collections  of 
Electronic  Mail’,  Proceedings  of  the  26th  annual  ACM  SIGIR  Conference,  pp.  468. 

Li, W., Hershkop, S. and Stolfo, S. J. (2004): ‘Email Archive Analysis Through Graphical
Visualization’, Proceedings of the 2004 ACM Workshop on Visualization and Data Mining
for Computer Security, pp. 128-132.

MacKay, W. (1988): ‘More than Just a Communication System: Diversity in the Use of
Electronic Mail’, Proceedings of the 1988 ACM Conference on Computer-Supported
Cooperative Work, pp. 344-353.

MacQueen, J. B. (1967): ‘Some Methods for Classification and Analysis of Multivariate
Observations’, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and
Probability, pp. 281-297.

Rohall, S. L., Gruen, D., Moody, P., Wattenberg, M., Stern, M., Kerr, B., Stachel, B., Dave, K.,
Armes, R. and Wilcox, E. (2003): ‘ReMail: A Reinvented Email Prototype’, Proceedings of
CHI 2004, pp. 791-792.

Sack,  W.  (2000):  ‘Discourse  Diagrams:  Interface  Design  for  Very  Large  Scale  Conversations’, 
Proceedings  of  the  33rd  Hawaii  International  Conference  on  System  Sciences,  January  2000, 
p.  3034. 

Seo, J. and Shneiderman, B. (2002): ‘Interactively Exploring Hierarchical Clustering Results’,
IEEE Computer, vol. 35, no. 7, July 2002, pp. 80-86.

Smith,  M.  (1999):  ‘Invisible  Crowds  in  Cyberspace:  Measuring  and  Mapping  the  Social  Structure 
of  USENET’,  in  Smith,  M.  and  Kollock,  P.  (eds.):  Communities  in  Cyberspace,  Routledge 
Press,  London,  1999. 

Smith,  M.  (2002):  ‘Tools  for  Navigating  Large  Social  Cyberspaces’,  Communications  of  the 
ACM,  vol.  45,  no.  4.,  April  2002,  pp.  51-55. 

Tyler, J. R. and Tang, J. C. (2003): ‘When Can I Expect an Email Response? A Study of Rhythms
in Email Usage’, Proceedings of ECSCW 2003.

Tyler, J. R., Wilkinson, D. M. and Huberman, B. A. (2003): ‘Email as Spectroscopy: Automated
Discovery of Community Structure within Organizations’, Communities and Technologies,
pp. 81-96.

Venolia, G. and Neustaedter, C. (2003): ‘Understanding Sequence and Reply Relationships within
Email Conversations: A Mixed-Model Visualization’, Proceedings of CHI 2003, pp. 361-368.

Viegas, F., boyd, d., Nguyen, D., Potter, J. and Donath, J. (2004): ‘Digital Artifacts for Remembering
and Storytelling: PostHistory and Social Network Fragments’, Proceedings of the 37th
Hawaii International Conference on System Sciences, January 2004, pp. 40109a.

Wattenberg,  M.  (2001):  ‘Sketching  a  graph  to  query  a  timeseries  database’,  Proceedings  of  CHI 
2001,  pp.  379-380. 

Whittaker,  S.  and  Sidner,  C.  (1996):  ‘Email  overload:  exploring  personal  information 
management  of  email’.  Proceedings  of  CHI  1996,  pp.  276-283. 


